Accurate shade matching is essential in restorative and prosthetic dentistry, yet it remains difficult because visual assessment is subjective. We develop and evaluate a deep learning approach for the simultaneous segmentation of natural teeth and shade guides in intraoral photographs, using four fine-tuned variants of the Segment Anything Model 2 (SAM2: tiny, small, base plus, and large) and a UNet baseline trained under the same protocol. Spatial performance was assessed with the Dice Similarity Coefficient (DSC), Intersection over Union (IoU), and the 95th-percentile Hausdorff distance normalized by the ground-truth equivalent diameter (HD95). Color consistency within masks was quantified by the coefficient of variation (CV) of the CIELAB components (L*, a*, b*), and perceptual color difference was measured with CIEDE2000 (ΔE00).

On a held-out test set, all SAM2 variants achieved high overlap accuracy. SAM2-large performed best (DSC: 0.987 ± 0.006; IoU: 0.975 ± 0.012; HD95: 1.25 ± 1.80%), followed by SAM2-small (0.987 ± 0.008; 0.974 ± 0.014; 2.96 ± 11.03%), SAM2-base plus (0.985 ± 0.011; 0.971 ± 0.021; 1.71 ± 3.28%), and SAM2-tiny (0.979 ± 0.015; 0.959 ± 0.028; 6.16 ± 11.17%). UNet reached DSC = 0.972 ± 0.020, IoU = 0.947 ± 0.035, and HD95 = 6.54 ± 16.35%. The CV distributions of all prediction models closely matched the ground truth (e.g., GT L*: 0.164 ± 0.040; UNet: 0.144 ± 0.028; SAM2-small: 0.164 ± 0.038; SAM2-base plus: 0.162 ± 0.039). Full-mask ΔE00 was low across models, reported as median (mean ± SD): UNet 0.325 (0.487 ± 0.364); SAM2-tiny 0.162 (0.410 ± 0.665); SAM2-small 0.078 (0.126 ± 0.166); SAM2-base plus 0.072 (0.198 ± 0.417); SAM2-large 0.065 (0.167 ± 0.257). These ΔE00 values lie well below the ≈1 just-noticeable-difference (JND) threshold on average, indicating close chromatic agreement between predictions and annotations.
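As a concrete illustration, the spatial metrics above can be sketched for a pair of binary masks as follows. This is a minimal reference implementation under our own assumptions (function names and the 4-neighbor boundary extraction are ours; the paper's exact implementation is not shown), with HD95 expressed as a percentage of the ground-truth equivalent diameter, matching the normalization described above:

```python
import numpy as np

def dice_iou(pred, gt):
    """Dice similarity coefficient and IoU for two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    iou = inter / np.logical_or(pred, gt).sum()
    return dice, iou

def hd95_normalized(pred, gt):
    """95th-percentile symmetric Hausdorff distance between boundary
    pixels, as a percentage of the ground-truth equivalent diameter
    2*sqrt(area/pi)."""
    def boundary(mask):
        # Boundary pixels: mask pixels with at least one background 4-neighbor.
        padded = np.pad(mask, 1)
        interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                    & padded[1:-1, :-2] & padded[1:-1, 2:])
        return np.argwhere(mask & ~interior)

    bp, bg = boundary(pred.astype(bool)), boundary(gt.astype(bool))
    # All pairwise Euclidean distances (adequate for modest boundary sizes).
    d = np.sqrt(((bp[:, None, :] - bg[None, :, :]) ** 2).sum(-1))
    hd95 = max(np.percentile(d.min(axis=1), 95),
               np.percentile(d.min(axis=0), 95))
    eq_diam = 2.0 * np.sqrt(gt.sum() / np.pi)
    return 100.0 * hd95 / eq_diam
```

For production use, the pairwise-distance step would typically be replaced by a KD-tree or distance-transform formulation to keep memory bounded on full-resolution masks.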
Within a single dataset and training protocol, fine-tuned SAM2, especially its larger variants, provides robust spatial accuracy, boundary reliability, and color fidelity suitable for clinical shade-matching workflows, while UNet offers a competitive convolutional baseline. These results indicate technical feasibility rather than clinical validation; broader baselines and external, multi-center evaluations are needed before the approach can be recommended for routine shade-matching workflows.
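The color-uniformity metric reported above (the per-channel CV of CIELAB values within a mask) can be sketched as below. This is a minimal illustration under our own assumptions: the helper name `lab_cv` is ours, and the image is assumed to be already converted to CIELAB (e.g., via `skimage.color.rgb2lab`) upstream:

```python
import numpy as np

def lab_cv(lab_image, mask):
    """Coefficient of variation (std / mean) of each CIELAB channel
    (L*, a*, b*) over the pixels selected by a binary mask.

    `lab_image`: (H, W, 3) array already in CIELAB coordinates.
    `mask`: (H, W) binary mask selecting the region of interest.
    Note: a* and b* means can be near zero for some materials, in
    which case their CV is numerically unstable; CV is most
    meaningful for the L* channel.
    """
    pixels = lab_image[mask.astype(bool)]  # (N, 3) masked pixels
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0, ddof=0)
    return std / mean
```

A perfectly uniform region yields a CV of zero in every channel; larger values indicate greater within-mask color dispersion.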